Estimating parameters (\(w\) and \(b\))
Find optimal values of \(w_{\cdot,j}\) and \(b_j\) over all neurons \(j\)
Split the data into training and test sets
Suppose we have \(K\) training examples \(x^{(1)},\ldots,x^{(K)}\), each with targets \(y_i\) and network outputs \(\hat{y}_i\)
Then the Quadratic Loss Function is defined as follows:
1. For each \(x\in X\), use the residual sum of squares (RSS) as the error measure
\(\begin{eqnarray*}L(w,b|x) &=& \sum_i\frac{1}{2} \left(y_i-\hat{y}_i\right)^2\end{eqnarray*}\)
2. The full quadratic cost function is simply the mean squared error (MSE) used in cross-validation \[\begin{eqnarray} L(w,b) &=& \frac{1}{K} \sum_{k=1}^K L(w,b|x^{(k)}) \end{eqnarray}\]
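As a minimal sketch in Python (plain lists, no libraries; the function names `rss` and `mse` are just illustrative), the per-example loss and the full cost could be computed like this:

```python
def rss(y, y_hat):
    """Per-example quadratic loss L(w, b | x): half the squared residuals, summed over outputs."""
    return sum(0.5 * (yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))

def mse(ys, y_hats):
    """Full cost L(w, b): mean of the per-example losses over the K training examples."""
    K = len(ys)
    return sum(rss(y, y_hat) for y, y_hat in zip(ys, y_hats)) / K
```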
Consider inverted hill-climbing in one dimension \(v\), i.e., we want to find the minimum instead of the maximum.
Repeatedly step against the slope, \(v' = v - \eta\frac{\partial L(v|x)}{\partial v}\), where \(\eta\) is called the learning rate
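A minimal one-dimensional sketch in Python, assuming a toy quadratic loss (not the network's loss) so the derivative is easy to write down:

```python
def loss(v):          # toy loss with its minimum at v = 2
    return (v - 2.0) ** 2

def dloss_dv(v):      # its derivative dL/dv
    return 2.0 * (v - 2.0)

eta = 0.1             # learning rate
v = 0.0               # starting point
for _ in range(100):
    v = v - eta * dloss_dv(v)   # step against the slope
# v is now very close to the minimum at 2
```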
Same thing really, but we have to have partial derivatives for each dimension, which makes it look more complicated.
Consider a 2-dimensional case. We will treat each dimension separately
Find the partial derivatives for both dimensions \[\begin{pmatrix} \frac{\partial L(v_1,v_2|x)}{\partial v_1}\\ \frac{\partial L(v_1,v_2|x)}{\partial v_2} \end{pmatrix}\]
Take a reasonably long step \(\begin{eqnarray*} \begin{pmatrix} v'_1\\ v'_2\end{pmatrix} &=& \begin{pmatrix}v_1-\eta\frac{\partial L(v_1,v_2|x)}{\partial v_1} \\ v_2-\eta\frac{\partial L(v_1,v_2|x)}{\partial v_2} \end{pmatrix} \end{eqnarray*}\)
(A vector of partial derivatives is called a gradient)
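The same sketch in two dimensions, again with a toy loss: compute both partial derivatives, then move each coordinate a step \(\eta\) against its slope.

```python
def grad(v1, v2):
    """Gradient of the toy loss L(v1, v2) = (v1 - 1)^2 + (v2 + 3)^2."""
    return (2.0 * (v1 - 1.0), 2.0 * (v2 + 3.0))

eta = 0.1
v1, v2 = 0.0, 0.0
for _ in range(100):
    g1, g2 = grad(v1, v2)
    v1, v2 = v1 - eta * g1, v2 - eta * g2   # one step along the negative gradient
# (v1, v2) is now very close to the minimum at (1, -3)
```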
\[\begin{array}{lllll} i_1 & \Rightarrow z_1 & \Rightarrow a_1 & \Rightarrow z_2 & \Rightarrow a_2 \\ \\ x=i_1 \\ &\Rightarrow i_1 \times w_1 + b_1 = z_1 \\ &&\Rightarrow \sigma(z_1) = a_1 \\ &&&\Rightarrow a_1 \times w_2 + b_2 = z_2 \\ &&&&\Rightarrow \sigma(z_2) = a_2 = \hat{y} \end{array}\]
\(\begin{array}{lllll} i_1 & \Rightarrow z_1 & \Rightarrow a_1 & \Rightarrow z_2 & \Rightarrow a_2 \Rightarrow \widehat{y} \end{array}\)

With \(x = 0.05\), \(w_1 = 0.1\), \(b_1 = -0.1\), \(w_2 = 0.3\), \(b_2 = 0.3\):

\[\begin{array}{lclcl}
i_1 &=& x &=& 0.05\\
z_1 &=& i_1 \times w_1 + b_1 &=& 0.05 \times 0.1 - 0.1 \;=\; -0.095\\
a_1 &=& \sigma(z_1) &=& \sigma(-0.095) \;=\; 0.476\\
z_2 &=& a_1 \times w_2 + b_2 &=& 0.476 \times 0.3 + 0.3 \;=\; 0.443\\
\hat{y} \;=\; a_2 &=& \sigma(z_2) &=& \sigma(0.443) \;=\; 0.609
\end{array}\]
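A short Python sketch of this forward pass, with the parameter values read off from the example (the variable names mirror the notation above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Parameters from the worked example
w1, b1 = 0.1, -0.1
w2, b2 = 0.3, 0.3

x = 0.05                 # input i_1
z1 = x * w1 + b1         # -0.095
a1 = sigmoid(z1)         # 0.476
z2 = a1 * w2 + b2        # 0.443
a2 = sigmoid(z2)         # 0.609  (= y_hat)
```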
\(x \quad = \quad 0.05\)
\(i_1 \quad = \quad 0.05\)
\(z_1 \quad = \quad -0.095\)
\(a_1 \quad = \quad 0.476\)
\(z_2 \quad = \quad 0.443\)
\(a_2 \quad = \quad 0.609\)
\(y \quad = \quad 0.01\)
Partial derivative w.r.t.:
\(w_2: \qquad \frac{\partial z_2}{\partial w_2} \times \frac{\partial a_2}{\partial z_2} \times \frac{\partial L(w,b|x)}{\partial a_2} \;=\; \frac{\partial L(w,b|x)}{\partial w_2}\)
\(\frac{\partial z_2}{\partial w_2} \quad = \quad \frac{\partial \left(a_1\times w_2 +b_2\right)}{\partial w_2}\) \(\quad = \quad a_1\) \(\quad = \quad 0.476\)
\(\frac{\partial a_2}{\partial z_2} \quad = \quad \frac{\partial \sigma(z_2)}{\partial z_2}\) \(\quad = \quad a_2\left(1-a_2\right)\) \(\quad = \quad 0.609(1-0.609) \quad = \quad 0.238\)
\(\frac{\partial L(w,b|x)}{\partial a_2} \quad = \quad \frac{\partial \frac{1}{2}(y - a_2)^2}{\partial a_2}\) \(\quad = \quad \left(a_2-y\right)\) \(\quad = \quad 0.609 - 0.01 \quad = \quad 0.599\)
\(\frac{\partial L(w,b|x)}{\partial w_2} \quad = \quad \frac{\partial z_2}{\partial w_2} \times \frac{\partial a_2}{\partial z_2} \times \frac{\partial L(w,b|x)}{\partial a_2}\) \(\quad = \quad 0.476 \times 0.238 \times 0.599 \quad = \quad 0.068\)
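Continuing the Python sketch, the three factors and their product can be checked numerically (`a1` and `a2` are the activations from the forward pass above; `y = 0.01` is the target):

```python
a1, a2 = 0.476, 0.609   # activations from the forward pass above
y = 0.01                 # target output

dz2_dw2 = a1                   # dz2/dw2 = a1             ~ 0.476
da2_dz2 = a2 * (1.0 - a2)      # da2/dz2 = a2 * (1 - a2)  ~ 0.238
dL_da2  = a2 - y               # dL/da2  = a2 - y         ~ 0.599

dL_dw2 = dz2_dw2 * da2_dz2 * dL_da2   # ~ 0.068
```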
The same chain rule gives the partial derivatives w.r.t. each parameter:
\[\begin{array}{ll}
w_2: & \frac{\partial z_2}{\partial w_2} \times \frac{\partial a_2}{\partial z_2} \times \frac{\partial L(w,b|x)}{\partial a_2} \;=\; \frac{\partial L(w,b|x)}{\partial w_2}\\[4pt]
b_2: & \frac{\partial z_2}{\partial b_2} \times \frac{\partial a_2}{\partial z_2} \times \frac{\partial L(w,b|x)}{\partial a_2} \;=\; \frac{\partial L(w,b|x)}{\partial b_2}\\[4pt]
w_1: & \frac{\partial z_1}{\partial w_1} \times \frac{\partial a_1}{\partial z_1} \times \frac{\partial z_2}{\partial a_1} \times \frac{\partial a_2}{\partial z_2} \times \frac{\partial L(w,b|x)}{\partial a_2} \;=\; \frac{\partial L(w,b|x)}{\partial w_1}\\[4pt]
b_1: & \frac{\partial z_1}{\partial b_1} \times \frac{\partial a_1}{\partial z_1} \times \frac{\partial z_2}{\partial a_1} \times \frac{\partial a_2}{\partial z_2} \times \frac{\partial L(w,b|x)}{\partial a_2} \;=\; \frac{\partial L(w,b|x)}{\partial b_1}
\end{array}\]
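Putting the pieces together, a compact Python sketch of all four gradients for this one-neuron-per-layer example (an illustration under the same numbers as above, not a general implementation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w1, b1, w2, b2 = 0.1, -0.1, 0.3, 0.3
x, y = 0.05, 0.01

# Forward pass
z1 = x * w1 + b1
a1 = sigmoid(z1)
z2 = a1 * w2 + b2
a2 = sigmoid(z2)

# Shared chain-rule factors
dL_da2  = a2 - y               # dL/da2
da2_dz2 = a2 * (1.0 - a2)      # da2/dz2
dL_dz2  = dL_da2 * da2_dz2     # dL/dz2

# Output layer: dz2/dw2 = a1, dz2/db2 = 1
dL_dw2 = a1 * dL_dz2
dL_db2 = 1.0 * dL_dz2

# Hidden layer: dz2/da1 = w2, da1/dz1 = a1(1 - a1), dz1/dw1 = x, dz1/db1 = 1
dL_dz1 = w2 * dL_dz2 * a1 * (1.0 - a1)
dL_dw1 = x * dL_dz1
dL_db1 = 1.0 * dL_dz1
```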
\[\begin{eqnarray} L(w,b|x) &=& \frac{1}{2}\sum_i\left(y_i-\hat{y}_i\right)^2\\ L(w,b) &=& \frac{1}{K}\sum_{k=1}^K L(w,b|x^{(k)}) \end{eqnarray}\]

- Residual sum of squares (RSS)
- Mean squared error (MSE)
It is, however, used in the output layer for regression problems!
(More about pros and cons of different activation functions in a later lecture)